PeterMac Data Science’s modified version of material by the University of Cambridge (Mark Dunning, Suraj Menon and Aiora Zabala, Robert Stojnić, Laurent Gatto, Rob Foy, John Davey, Dávid Molnár and Ian Roberts, original material: http://cambiotraining.github.io/r-intro ) and Software Carpentry.
1. Introduction to R and its environment
What is R?
A statistical programming environment suited to high-level data analysis
But offers much more than just statistics
Open source and cross-platform
Extensive graphics capabilities
Diverse range of add-on packages
Active community of developers
Thorough documentation
http://www.r-project.org/
R can facilitate Reproducible Research
Statisticians at MD Anderson tried to reproduce results from a Duke paper and unintentionally unravelled a web of incompetence and skullduggery
as reported in the New York Times
Very entertaining talk from Keith Baggerly in Cambridge, December 2010
Video: https://www.youtube.com/embed/7gYIs7uYbMo
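According to editorials in Nature (May 2016), the reproducibility crisis is still ongoing
Reality check on reproducibility (http://www.nature.com/news/reality-check-on-reproducibility-1.19961)
1,500 scientists lift the lid on reproducibility (http://www.nature.com/news/1-500-scientists-lift-the-lid-on-reproducibility-1.19970)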
Getting started
In this course we use a server that has everything already installed for you.
If you want to install R on your own computer, you can get the latest release from https://www.r-project.org/
Base package and Contributed packages (general purpose extras)
14094 available packages as of Wed Apr 17 15:29:48 2019
Download from http://mirrors.ebi.ac.uk/CRAN/
Windows, Mac and Linux versions available
Executed using command line, or a graphical user interface (GUI)
On this course, we use the RStudio GUI (www.rstudio.com)
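Once R is available, a quick check from the console can confirm which version is running and which packages are in the local library. The lines below are only a minimal sketch using base R functions; the variable names are illustrative, and the values printed will differ from one machine to another.

```r
# Report the version of R that is currently running
r_version <- R.version.string
print(r_version)

# List the packages installed in the local library (base and contributed)
installed <- installed.packages()
n_installed <- nrow(installed)
cat("Installed packages:", n_installed, "\n")

# Pick out the base packages that ship with R itself
base_pkgs <- rownames(installed)[installed[, "Priority"] %in% "base"]
print(base_pkgs)
```

This only inspects what is already installed locally, so it runs without an internet connection.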
Introduction to RStudio
(from Software Carpentry)
Throughout this lesson, we’re going to teach you some of the fundamentals of the R language. We’ll be using RStudio: a free, open source integrated development environment for R. It provides a built-in editor, works on all platforms (including on servers) and offers many advantages, such as integration with version control and project management.
Basic layout
When you first open RStudio, you will be greeted by three panels:
The interactive R console (entire left)
Environment/History (tabbed in upper right)
Files/Plots/Packages/Help/Viewer (tabbed in lower right)
Once you open files, such as R scripts, an editor panel will also open in the top left.
Workflow within RStudio
There are two main ways one can work within RStudio.
Test and play within the interactive R console, then copy code into a .R file to run later.
This works well for small tests and when you are first starting out.
It quickly becomes laborious.
Start writing in a .R file and use RStudio’s shortcut keys for the Run command to push the current line, selected lines or modified lines to the interactive R console.
This is a great way to start; all your code is saved for later.
You will be able to run the file you create from within RStudio or using R’s source() function.
We will use the second way in this course.
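As a rough illustration of running a saved file with source(), the sketch below writes a tiny script to a temporary file and then runs it. The script contents and the names script_path, x and total are made up for this example; in practice you would pass the path of your own .R file.

```r
# Write a tiny two-line script to a temporary .R file (illustrative content)
script_path <- tempfile(fileext = ".R")
writeLines(c(
  "x <- c(1, 2, 3)",
  "total <- sum(x)"
), script_path)

# Run every line of the file; by default the objects it creates
# end up in your workspace (the global environment)
source(script_path)
print(total)  # [1] 6
```

Because the code lives in a file, the same analysis can be rerun later with a single call rather than retyped at the console.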
Tip: Running segments of your code
RStudio offers you great flexibility in running code from within the editor window. There are buttons, menu choices, and keyboard shortcuts. To run the current line, you can:
1. click on the Run button above the editor panel, or
2. select “Run Lines” from the “Code” menu, or
3. hit Ctrl+Return on Windows or Linux, or ⌘+Return on macOS. (This shortcut can also be seen by hovering the mouse over the button.)
To run a block of code, select it and then click Run. If you have modified a line of code within a block you have just run, there is no need to reselect the section and Run it again; you can use the next button along, Re-run the previous region. This will run the previous code block including the modifications you have made.
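If you would like something to practise on, the few lines below could be typed into a new .R file and sent to the console one line at a time, or selected and run together. The values and the names x and m are arbitrary and chosen only for illustration.

```r
# Three short lines to try running one at a time, or as a selected block
x <- c(2, 4, 6)   # create a small numeric vector
m <- mean(x)      # compute its mean
print(m)          # [1] 4
```

Try changing one of the numbers in the first line and then using Re-run the previous region to see the updated result.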